2.
Methods Mol Biol ; 2212: 17-35, 2021.
Article in English | MEDLINE | ID: mdl-33733347

ABSTRACT

We present SNPInt-GPU, a software providing several methods for statistical epistasis testing. SNPInt-GPU supports GPU acceleration using the Nvidia CUDA framework, but can also be used without GPU hardware. The software implements logistic regression (as in PLINK epistasis testing), BOOST, log-linear regression, mutual information (MI), and information gain (IG) for pairwise testing as well as mutual information and information gain for third-order tests. Optionally, r² scores for testing for linkage disequilibrium (LD) can be calculated on the fly. SNPInt-GPU is publicly available on GitHub. The software requires a Linux-based operating system and CUDA libraries. This chapter describes detailed installation and usage instructions as well as examples for basic preliminary quality control and analysis of results.
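To illustrate the two information-theoretic measures named in the abstract, here is a minimal CPU-only sketch in Python. It is not SNPInt-GPU's implementation; the function names, the toy genotype coding, and the choice of IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C) as the definition of pairwise information gain are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete observations."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def information_gain(snp_a, snp_b, phenotype):
    """Assumed definition: IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C)."""
    joint = list(zip(snp_a, snp_b))
    return (mutual_information(joint, phenotype)
            - mutual_information(snp_a, phenotype)
            - mutual_information(snp_b, phenotype))

# Hypothetical toy data: the phenotype is the XOR of two binary-coded SNPs,
# so neither SNP is informative alone but the pair is fully informative.
snp_a = [0, 0, 1, 1] * 5
snp_b = [0, 1, 0, 1] * 5
pheno = [a ^ b for a, b in zip(snp_a, snp_b)]

mi_a = mutual_information(snp_a, pheno)
ig_ab = information_gain(snp_a, snp_b, pheno)
```

In this toy case the single-SNP mutual information is zero while the pairwise information gain is one bit, which is the kind of purely interactive signal that epistasis tests look for.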


Subjects
Algorithms; Data Curation/statistics & numerical data; Epistasis, Genetic; Software; Entropy; Humans; Linkage Disequilibrium; Logistic Models; Quality Control
3.
Nucleic Acids Res ; 49(D1): D1534-D1540, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33166392

ABSTRACT

Since the outbreak of the current pandemic in 2020, there has been a rapid growth of published articles on COVID-19 and SARS-CoV-2, with about 10,000 new articles added each month. This is causing an increasingly serious information overload, making it difficult for scientists, healthcare professionals and the general public to remain up to date on the latest SARS-CoV-2 and COVID-19 research. Hence, we developed LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub, to track up-to-date scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. To support manual curation, advanced machine-learning and deep-learning algorithms have been developed, evaluated and integrated into the curation workflow. To the best of our knowledge, LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available. Since its release, LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, such as evidence synthesis, drug discovery and text and data mining, among others.


Subjects
COVID-19/prevention & control; Data Curation/statistics & numerical data; Data Mining/statistics & numerical data; Databases, Factual; PubMed/statistics & numerical data; SARS-CoV-2/isolation & purification; COVID-19/epidemiology; COVID-19/virology; Data Curation/methods; Data Mining/methods; Humans; Internet; Machine Learning; Pandemics; Publications/statistics & numerical data; SARS-CoV-2/physiology
4.
Nucleic Acids Res ; 49(D1): D1507-D1514, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33180112

ABSTRACT

Europe PMC (https://europepmc.org) is a database of research articles, including peer reviewed full text articles and abstracts, and preprints - all freely available for use via website, APIs and bulk download. This article outlines new developments since 2017 where work has focussed on three key areas: (i) Europe PMC has added to its core content to include life science preprint abstracts and a special collection of full text of COVID-19-related preprints. Europe PMC is unique as an aggregator of biomedical preprints alongside peer-reviewed articles, with over 180 000 preprints available to search. (ii) Europe PMC has significantly expanded its links to content related to the publications, such as links to Unpaywall, providing wider access to full text, preprint peer-review platforms, all major curated data resources in the life sciences, and experimental protocols. The redesigned Europe PMC website features the PubMed abstract and corresponding PMC full text merged into one article page; there is more evident and user-friendly navigation within articles and to related content, plus a figure browse feature. (iii) The expanded annotations platform offers ∼1.3 billion text mined biological terms and concepts sourced from 10 providers and over 40 global data resources.


Subjects
Biological Science Disciplines/statistics & numerical data; COVID-19/prevention & control; Data Curation/statistics & numerical data; Data Mining/statistics & numerical data; Databases, Factual/statistics & numerical data; PubMed; SARS-CoV-2/isolation & purification; Biological Science Disciplines/methods; Biomedical Research/methods; Biomedical Research/statistics & numerical data; COVID-19/epidemiology; COVID-19/virology; Data Curation/methods; Data Mining/methods; Epidemics; Europe; Humans; Internet; SARS-CoV-2/physiology
6.
Molecules ; 24(8), 2019 Apr 23.
Article in English | MEDLINE | ID: mdl-31018579

ABSTRACT

The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effect certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation was required to improve these datasets with respect to how FAIR (Findable, Accessible, Interoperable, and Reusable) they are. In this study, we fully annotated the Tox21 published data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of these signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories and interpreted the results to illustrate the utility and re-usability of the datasets. With this study, we aimed to demonstrate the importance of data standards in reporting screening results and high-quality annotations to enable re-use and interpretation of these data. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregate datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).


Subjects
Data Curation/statistics & numerical data; Gene Expression Regulation/drug effects; Metadata/standards; Pharmacogenetics/methods; Toxicology/methods; Xenobiotics/toxicity; Benchmarking; Datasets as Topic; Gene Expression Profiling; Genes, Reporter; High-Throughput Screening Assays/standards; Humans; Xenobiotics/chemistry; Xenobiotics/classification
7.
Genet Epidemiol ; 43(4): 356-364, 2019 06.
Article in English | MEDLINE | ID: mdl-30657194

ABSTRACT

When interpreting genome-wide association peaks, it is common to annotate each peak by searching for genes with plausible relationships to the trait. However, "all that glitters is not gold": one might interpret apparent patterns in the data as plausible even when the peak is a false positive. Accordingly, we sought to see how human annotators interpreted association results containing a mixture of peaks from both the original trait and a genetically uncorrelated "synthetic" trait. Two of us prepared a mix of original and synthetic peaks of three significance categories from five different scans, along with relevant literature search results, and then we all annotated these regions. Three annotators also scored the strength of evidence connecting each peak to the scanned trait and the likelihood of further studying that region. While annotators found original peaks to have stronger evidence (Bonferroni-corrected p = 0.017) and a higher likelihood of further study (Bonferroni-corrected p = 0.006) than synthetic peaks, annotators often made convincing connections between the synthetic peaks and the original trait, finding these connections 55% of the time. These results show that it is not difficult for annotators to make convincing connections between synthetic association signals and genes found in those regions.


Subjects
Data Curation; Data Interpretation, Statistical; False Positive Reactions; Genome-Wide Association Study/statistics & numerical data; Data Curation/methods; Data Curation/standards; Data Curation/statistics & numerical data; Deception; Genome-Wide Association Study/standards; Humans; Phenotype; Polymorphism, Single Nucleotide
8.
PLoS Comput Biol ; 14(8): e1006390, 2018 08.
Article in English | MEDLINE | ID: mdl-30102703

ABSTRACT

Manually curating biomedical knowledge from publications is necessary to build a knowledge-based service that provides highly precise and organized information to users. The process of retrieving relevant publications for curation, which is also known as document triage, is usually carried out by querying and reading articles in PubMed. However, this query-based method often obtains unsatisfactory precision and recall on the retrieved results, and it is difficult to manually generate optimal queries. To address this, we propose a machine-learning-assisted triage method. We collect previously curated publications from two databases, UniProtKB/Swiss-Prot and the NHGRI-EBI GWAS Catalog, and use them as a gold-standard dataset for training deep learning models based on convolutional neural networks. We then use the trained models to classify and rank new publications for curation. For evaluation, we apply our method to the real-world manual curation process of UniProtKB/Swiss-Prot and the GWAS Catalog. We demonstrate that our machine-assisted triage method outperforms the current query-based triage methods, improves efficiency, and enriches curated content. Our method achieves a precision 1.81 and 2.99 times higher than that obtained by the current query-based triage methods of UniProtKB/Swiss-Prot and the GWAS Catalog, respectively, without compromising recall. In fact, our method retrieves many additional relevant publications that the query-based method of UniProtKB/Swiss-Prot could not find. As these results show, our machine learning-based method can make the triage process more efficient and is being implemented in production so that human curators can focus on more challenging tasks to improve the quality of knowledge bases.
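The comparison in this abstract rests on precision and recall of the retrieved publications. As a rough sketch of how such a comparison could be computed, the Python below scores two retrieval methods against a gold-standard set. The function name and all document identifiers are hypothetical; none of the numbers come from the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a gold-standard set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs: four relevant papers in the gold standard.
gold = {1, 2, 3, 4}
query_based = {1, 2, 3, 5, 6, 7, 8, 9}   # broad query, many irrelevant hits
model_based = {1, 2, 3, 10}              # classifier output, fewer irrelevant hits

p_query, r_query = precision_recall(query_based, gold)
p_model, r_model = precision_recall(model_based, gold)
precision_ratio = p_model / p_query
```

In this made-up example both methods find the same three relevant papers, so recall is unchanged, while the precision ratio of 2.0 reflects the smaller number of irrelevant documents a curator would need to read.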


Subjects
Data Curation/methods; Information Storage and Retrieval/methods; Data Curation/statistics & numerical data; Databases, Genetic; Databases, Protein; Deep Learning; Genomics; Knowledge Bases; Machine Learning; Publications
9.
Bioinformatics ; 33(21): 3454-3460, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-29036270

ABSTRACT

MOTIVATION: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches. RESULTS: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, and that measuring the number of curated publications is insufficient to provide a complete picture as demonstrated by the fact that 8000-10 000 papers are curated in UniProt each year while curators evaluate 50 000-70 000 papers per year. We show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable. AVAILABILITY AND IMPLEMENTATION: UniProt is freely available at http://www.uniprot.org/. CONTACT: sylvain.poux@sib.swiss. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Curation; Databases, Protein; Data Curation/statistics & numerical data; Data Mining; Databases, Protein/statistics & numerical data; Humans; Knowledge Bases; PubMed/statistics & numerical data; Review Literature as Topic; Statistics as Topic
10.
Appl Neuropsychol Adult ; 22(6): 399-406, 2015.
Article in English | MEDLINE | ID: mdl-25785544

ABSTRACT

Davis, Axelrod, McHugh, Hanks, and Millis (2013) documented that in a battery of 25 tests, producing 15, 10, and 5 abnormal scores at 1, 1.5, and 2 standard deviations below the norm-referenced mean, respectively, and an overall test battery mean (OTBM) of T ≤ 38 accurately identifies performance invalidity. However, generalizability of these findings to other samples and test batteries remains unclear. This study evaluated the use of abnormal scores and the OTBM as performance validity measures in a different sample that was administered a 25-test battery that minimally overlapped with Davis et al.'s test battery. Archival analysis of 48 examinees with mild traumatic brain injury seen for medico-legal purposes was conducted. Producing 18 or more, 7 or more, and 5 or more abnormal scores at 1, 1.5, and 2 standard deviations below the norm-referenced mean, respectively, and an OTBM of T ≤ 40 most accurately classified examinees; however, using Davis et al.'s proposed cutoffs in the current sample maintained specificity at or near acceptable levels. Due to convergence across studies, producing ≥5 abnormal scores at 2 standard deviations below the norm-referenced mean is the most appropriate cutoff for clinical implementation; however, for batteries consisting of a different quantity of tests than 25, an OTBM of T ≤ 38 is more appropriate.
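The cutoffs attributed to Davis et al. can be read as a set of simple counting rules. The Python sketch below applies them to a list of test scores. It assumes that all scores are T scores (mean 50, standard deviation 10), that a score at or below the threshold counts as abnormal, and that each cutoff means "this many or more"; the function name and the example scores are hypothetical, and the abstract does not say how the indicators should be combined.

```python
from statistics import mean

T_MEAN, T_SD = 50.0, 10.0
# Minimum number of abnormal scores at each distance below the mean
# (cutoffs attributed to Davis et al. in the abstract, for a 25-test battery).
DAVIS_CUTOFFS = {1.0: 15, 1.5: 10, 2.0: 5}
OTBM_CUTOFF = 38.0  # overall test battery mean, T <= 38

def validity_indicators(t_scores):
    """Return one boolean per indicator; True means the cutoff is met."""
    flags = {}
    for sd_below, min_count in DAVIS_CUTOFFS.items():
        threshold = T_MEAN - sd_below * T_SD
        # Assumption: "abnormal" means at or below the threshold.
        n_abnormal = sum(1 for t in t_scores if t <= threshold)
        flags[f"abnormal_at_{sd_below}_sd"] = n_abnormal >= min_count
    flags["otbm"] = mean(t_scores) <= OTBM_CUTOFF
    return flags

# Hypothetical 25-test battery: five very low scores, twenty unremarkable ones.
example = [28] * 5 + [45] * 20
result = validity_indicators(example)
```

For this made-up examinee only the 2-standard-deviation indicator is met, which is the criterion the abstract singles out as most appropriate for clinical use.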


Subjects
Brain Injuries/complications; Cognition Disorders/diagnosis; Cognition Disorders/etiology; Neuropsychological Tests; Adult; Data Curation/statistics & numerical data; Disability Evaluation; Female; Humans; Male; Middle Aged; Psychometrics; Reference Values; Reproducibility of Results; Sensitivity and Specificity
12.
Stud Health Technol Inform ; 205: 116-20, 2014.
Article in English | MEDLINE | ID: mdl-25160157

ABSTRACT

Evaluation and validation have become a crucial problem for the development of semantic resources. We developed Ci4SeR, a Graphical User Interface to optimize the curation work (not taking into account structural aspects), suitable for any type of resource with lightweight description logic. We tested it on OntoADR, an ontology of adverse drug reactions. A single curator reviewed 326 terms (1020 axioms) in an estimated time of 120 hours (2.71 concepts and 8.5 axioms reviewed per hour) and added 1874 new axioms (15.6 axioms per hour). Compared with previous manual endeavours, the interface increases the rate of concept review by 68% and the rate of axiom addition by 486%. A wider use of Ci4SeR would help semantic resources curation and improve completeness of knowledge modelling.
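The per-hour figures in this abstract follow directly from the reported totals, which the short Python check below reproduces. The implied manual baseline rates are not stated in the abstract; they are derived here under the assumption that "increased by X%" means new rate = baseline × (1 + X/100).

```python
HOURS = 120
terms_reviewed, axioms_reviewed, axioms_added = 326, 1020, 1874

rates = {
    "concepts_reviewed_per_hour": terms_reviewed / HOURS,   # about 2.72 (the abstract reports 2.71)
    "axioms_reviewed_per_hour": axioms_reviewed / HOURS,    # 8.5
    "axioms_added_per_hour": axioms_added / HOURS,          # about 15.6
}

def implied_baseline(new_rate, percent_increase):
    """Baseline rate implied by a relative increase (assumed interpretation)."""
    return new_rate / (1 + percent_increase / 100)

baseline_concepts = implied_baseline(rates["concepts_reviewed_per_hour"], 68)
baseline_axioms_added = implied_baseline(rates["axioms_added_per_hour"], 486)
```

Under that assumption the earlier manual work would correspond to roughly 1.6 concepts reviewed and 2.7 axioms added per hour.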


Subjects
Adverse Drug Reaction Reporting Systems/statistics & numerical data; Data Curation/statistics & numerical data; Electronic Health Records/statistics & numerical data; Medical Record Linkage/methods; Semantics; Software; User-Computer Interface; Data Curation/methods; France; Information Storage and Retrieval/methods; Information Storage and Retrieval/statistics & numerical data; Natural Language Processing; Software Design; Vocabulary, Controlled
13.
Stud Health Technol Inform ; 205: 599-603, 2014.
Article in English | MEDLINE | ID: mdl-25160256

ABSTRACT

Clinicians need historical information that does not change over time, as well as other information from the notes of others, to inform their documentation. To save time, they cut and paste, since that is a feature of many conventional EHRs. Copy and paste is a solution to clinicians' needs that has associated downsides, including errors. As part of a study of clinicians using an innovative system which gives them complete control over information selection and arrangement, two used the process of note splitting to meet needs that are sometimes solved through cut and paste; four others used text insertion (partial note sections) to address related needs. The purpose of this study is to enhance understanding of the note splitting and text insertion phenomena by describing the processes, the resulting creations, and the associated clinician rationales. Mixed methods included a think-aloud protocol and analysis of user interface creations and time sequences.


Subjects
Documentation/methods; Electronic Health Records; Information Storage and Retrieval/methods; Needs Assessment; Utilization Review; Word Processing/statistics & numerical data; Writing; Data Curation/methods; Data Curation/statistics & numerical data; Documentation/statistics & numerical data; Information Storage and Retrieval/statistics & numerical data; Practice Patterns, Physicians'/statistics & numerical data; User-Computer Interface
14.
Comput Methods Programs Biomed ; 117(2): 104-13, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25168774

ABSTRACT

Studies in the health domain have shown that health websites provide imperfect information and give recommendations which are not up to date with the recent literature, even when their last modified dates are quite recent. In this paper, we propose a framework which automatically assesses the timeliness of the content of health websites using evidence-based medicine. Our aim is to assess the accordance of website contents with the current literature and information timeliness, disregarding the update time stated on the websites. The proposed method is based on automatic term recognition, relevance feedback and information retrieval techniques in order to generate time-aware structured queries. We tested the framework on diabetes health websites which were archived between 2006 and 2013 by Archive-it, using the American Diabetes Association's (ADA) guidelines. The results showed that the proposed framework achieves 65% and 77% accuracy in detecting the timeliness of the web content according to years and pre-determined time intervals, respectively. Information seekers and website owners may benefit from the proposed framework in finding relevant and up-to-date diabetes websites.


Subjects
Clinical Trials as Topic/statistics & numerical data; Data Curation/statistics & numerical data; Data Mining/methods; Diabetes Mellitus; Natural Language Processing; Periodicals as Topic/statistics & numerical data; Social Media/statistics & numerical data; Clinical Trials as Topic/classification; Data Curation/classification; Evidence-Based Medicine; Humans; Periodicals as Topic/classification; Social Media/classification; Time Factors
17.
Stud Health Technol Inform ; 192: 1001, 2013.
Article in English | MEDLINE | ID: mdl-23920775

ABSTRACT

Formats for data storage in personal computers vary according to manufacturer and models for personal health-monitoring devices such as blood-pressure and body-composition meters. In contrast, the data format of images from digital cameras is unified into a JPEG format with an Exif area and is already familiar to many users. We have devised a method that can contain health data as a JPEG file. Health data is stored in the Exif area of the JPEG file in an HL7 format. There is, however, a capacity limit of 64 KB for the Exif area. The aim of this study is to examine how much health data can actually be stored in the Exif area. We found that even with combined data from multiple devices, it was possible to store over a month of health data in a JPEG file, and using multiple JPEG files simply overcomes this limit. We believe that this method will help people to more easily handle health data regardless of the various device models they use.


Subjects
Computer Graphics/statistics & numerical data; Computer Graphics/standards; Data Compression/statistics & numerical data; Data Compression/standards; Electronic Health Records/standards; Information Storage and Retrieval/statistics & numerical data; Information Storage and Retrieval/standards; Data Curation/standards; Data Curation/statistics & numerical data; Health Level Seven/standards
18.
Stud Health Technol Inform ; 192: 1021, 2013.
Article in English | MEDLINE | ID: mdl-23920795

ABSTRACT

Standard Japanese electronic medical record (EMR) systems are associated with major shortcomings. For example, they do not assure lifelong readability of records because each document requires its own viewing software program, a system that is difficult to maintain over long periods of time. It can also be difficult for users to comprehend a patient's clinical history because different classes of documents can only be accessed from their own window. To address these problems, we developed a document-based electronic medical record that aggregates all documents for a patient in a PDF or DocuWorks format. We call this system the Document Archiving and Communication System (DACS). There are two types of viewers in the DACS: the Matrix View, which provides a time line of a patient's history, and the Tree View, which stores the documents in hierarchical document classes. We placed 2,734 document classes into 11 categories. A total of 223,972 documents were entered per month. The frequency of use of the DACS viewer was 268,644 instances per month. The DACS viewer was used to assess a patient's clinical history.


Subjects
Data Curation/statistics & numerical data; Electronic Health Records/statistics & numerical data; Hospital Communication Systems/statistics & numerical data; Information Storage and Retrieval/statistics & numerical data; Meaningful Use/statistics & numerical data; Utilization Review; Japan
19.
Stud Health Technol Inform ; 192: 1196, 2013.
Article in English | MEDLINE | ID: mdl-23920970

ABSTRACT

OBJECTIVE: To quantitatively describe (1) differences between search results derived at consecutive time points with the PubMed and OvidSP literature search interfaces over a five-day interval, and (2) the migration of citations through different subsets to estimate the timeliness of OvidSP. METHODS: PubMed identifiers (PMIDs) of the following subsets were retrieved from PubMed and OvidSP simultaneously (within 8 h) on 11 days in March and April 2010, including 5 consecutive days: as supplied by publisher, in process, PubMed not MEDLINE, and OLDMEDLINE. Search results were compared for difference and intersection sets. The migration of individual citations was determined by comparing corresponding sets over several days. RESULTS: The "in process" set was stable with about 446,000 to 452,000 citations; a small fraction of up to 3% of the total subsets were in PubMed only and OvidSP only subsets. About 96% of the ca. 10,500 citations in the OvidSP only subset migrated within 2 days out of the "in process" subset. The database of OvidSP is updated within a period of two days.
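The difference and intersection sets described in the methods map naturally onto set operations. The Python sketch below shows one way such a comparison could be carried out for a single subset retrieved from both interfaces, and how migration out of a subset between two days might be detected. The function names and all PMIDs are hypothetical, and this is not the authors' code.

```python
def compare_result_sets(pubmed_pmids, ovid_pmids):
    """Split two PMID collections into intersection and one-sided difference sets."""
    pubmed, ovid = set(pubmed_pmids), set(ovid_pmids)
    return {
        "both": pubmed & ovid,
        "pubmed_only": pubmed - ovid,
        "ovid_only": ovid - pubmed,
    }

def migrated_out(subset_earlier, subset_later):
    """PMIDs present in a subset on the earlier day but gone on the later day."""
    return set(subset_earlier) - set(subset_later)

# Hypothetical "in process" subsets retrieved on the same day.
pubmed_in_process = [101, 102, 103, 104]
ovid_in_process = [102, 103, 104, 105, 106]
comparison = compare_result_sets(pubmed_in_process, ovid_in_process)

# Hypothetical OvidSP "in process" subset two days later.
ovid_in_process_later = [102, 103, 104]
moved = migrated_out(comparison["ovid_only"], ovid_in_process_later)
share_moved = len(moved) / len(comparison["ovid_only"])
```

Here the two citations that appear only in the OvidSP subset on the first day have both left it two days later, which is the pattern the study uses to estimate the update lag.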


Subjects
Data Curation/statistics & numerical data; Database Management Systems/statistics & numerical data; Information Storage and Retrieval/statistics & numerical data; Natural Language Processing; Periodicals as Topic/statistics & numerical data; PubMed/statistics & numerical data; Search Engine/statistics & numerical data; Abstracting and Indexing; Time Factors